Method and apparatus for performing video image decoding

ABSTRACT

In one embodiment, a method of performing video image decoding includes the following. A compressed video image is downsampled in the frequency domain. The downsampled video image is inverse transformed. Motion compensation for the downsampled image is performed in the spatial domain.

BACKGROUND

[0001] 1. Field

[0002] This disclosure is related to video processing and, more particularly, to decoding video images.

[0003] 2. Background

[0004] Due to implementation constraints, motion compensation hardware employed in video decoding is typically designed for a given video image resolution. For example, without limitation, the MPEG2 specification, ISO/IEC 13818-2 MPEG-2 Video Coding Standard, “Information technology - Generic coding of moving pictures and associated audio information: Video,” March, 1995, hereinafter referred to as “MPEG2,” may impose a video resolution of 720 pixels by 480 pixels at 30 frames per second. In a conventional design, the engine that performs the decoding will typically only generate images at the video resolution that the compressed video bit stream specifies. As a result of the amount of memory employed to hold decoded images, higher resolution compressed video bit streams, such as MPEG2 bit streams for digital television (DTV) content, for example, will not run on such a system. If sufficient memory is available to decode at the full specified resolution, and a user chooses to view the video in a smaller window on a computer platform, for example, downscaling is performed on the full size decoded image at display time and, therefore, full resolution decoding is still employed. A need, therefore, exists for a method or technique for a system to operate on or produce video resolutions other than the resolutions specified by the compressed video bit stream providing the video or image data.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:

[0006] FIG. 1 is a block diagram illustrating an embodiment of a conventional pipeline for decoding compressed digital video images;

[0007] FIG. 2 is a block diagram illustrating an embodiment of an apparatus for performing video image decoding in accordance with the present invention;

[0008] FIG. 3 is a schematic diagram illustrating the result on a macroblock of applying an embodiment of a method for performing video decoding in accordance with the present invention;

[0009] FIG. 4 is a schematic diagram illustrating the result on a macroblock of applying another embodiment of a method for performing video decoding in accordance with the present invention;

[0010] FIG. 5 is a schematic diagram illustrating the result on a macroblock of applying yet another embodiment of a method for performing video decoding in accordance with the present invention;

[0011] FIG. 6 is a schematic diagram illustrating the result on a macroblock of applying still another embodiment of a method for performing video decoding in accordance with the present invention;

[0012] FIG. 7 is a block diagram illustrating an embodiment of hardware that may be employed to implement an embodiment of a method for performing video decoding in accordance with the present invention;

[0013] FIG. 8 illustrates FIG. 7 with an overlay to schematically illustrate correspondence with an embodiment of a three-dimensional (3D) rendering pipeline;

[0014] FIG. 9 is a schematic diagram illustrating the pixel operation of a bilinear interpolation, such as may be employed in an embodiment of a method for performing video decoding in accordance with the present invention;

[0015] FIG. 10 is a block diagram illustrating an embodiment of a bilinear interpolator such as may be employed in an embodiment of a method for performing video decoding in accordance with the present invention;

[0016] FIG. 11 is a block diagram illustrating an embodiment of a video decoder subsystem coupled with a video display subsystem, such as may employ an embodiment of a method for performing video decoding in accordance with the present invention;

[0017] FIG. 12 is a schematic diagram illustrating a scenario of spatial positions of regularly interlaced video data with uniformly positioned fields, where lines in the bottom field are positioned half-way between two lines in the top field, such as may be employed in an embodiment in accordance with the invention;

[0018] FIG. 13 is a schematic diagram illustrating a scenario of spatial position of regularly interlaced video data with non-uniformly positioned fields, where lines in the bottom field are positioned 1/4 of the way between two adjacent lines in the top field, such as may be employed in an embodiment in accordance with the invention;

[0019] FIG. 14 is a schematic diagram illustrating a scenario of spatial position of regularly interlaced video data with non-uniformly positioned fields, where lines in the bottom field are positioned 1/8 of the way between two adjacent lines in the top field, such as may be employed in an embodiment in accordance with the invention;

[0020] FIG. 15 is a schematic diagram illustrating the results of a DDA-based vertical scaling operation for a uniformly-positioned interlaced video source, such as may be employed in an embodiment in accordance with the invention; and

[0021] FIG. 16 is a schematic diagram illustrating the results of a DDA-based vertical scaling operation for a non-uniformly-positioned interlaced video source, such as may be employed in an embodiment in accordance with the invention.

DETAILED DESCRIPTION

[0022] In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.

[0023] As previously indicated, conventionally, a video decode and display system normally designed for a given maximum resolution will typically not operate on bit streams that specify higher video resolutions. Likewise, if a user chooses to view the video in a smaller window, downscaling the bit stream is conventionally achieved at display time and, therefore, full resolution decoding still occurs. Since full resolution decoding followed by downscaling adds cost in the form of additional computation, additional memory, additional memory bandwidth, and complex downscaling at display time, it would be desirable if downscaling of a bit stream could be accomplished without full resolution decoding.

[0024] Although the invention is not limited in scope in this respect, FIG. 7 is a schematic diagram illustrating an embodiment of a hardware motion compensation engine that may be employed to implement an embodiment of a method of performing video decoding in accordance with the invention. For example, as shall be described in greater detail hereinafter, a three-dimensional (3D) pipeline may be employed to efficiently perform motion compensation, as illustrated in FIG. 8, although hardware platforms other than a 3D pipeline may be employed to implement embodiments of a method of performing video decoding in accordance with an embodiment of the invention. FIGS. 7 and 8 are described in more detail hereinafter.

[0025] FIG. 1 is a block diagram illustrating an embodiment of a conventional pipeline for performing video image decoding. FIG. 2 is a block diagram illustrating an embodiment of an apparatus for performing video decoding in accordance with the present invention. In one embodiment in accordance with the invention, as shall be described in greater detail hereinafter, a compressed video image in the frequency domain is downsampled at 230 and then inverse transformed at 220. Motion compensation is performed on the downsampled image in the spatial domain at 210. Alternatively, as shall also be described in greater detail hereinafter, the compressed image in the frequency domain may be inverse transformed at 240 and then downsampled in the spatial domain and motion compensated. Although the invention is not limited in scope in this respect, one example of a compressed video image in the frequency domain comprises a discrete cosine transform (DCT) image. Likewise, although the invention is not limited in scope in this respect, such a DCT image may comply with the MPEG2 specification, as shall be described in greater detail hereinafter. In this context, although MPEG2 is referred to, including the aspect that divides an image into 16×16 macroblocks, the invention is not limited in scope to employing MPEG, including MPEG2, to employing macroblocks of this particular size and shape, or even to employing macroblocks at all.

[0026] As illustrated in FIG. 2 and as shall be described in greater detail, the DCT image may be downsampled before being delivered to the motion compensation engine. Likewise, as indicated above, downsampling may be applied in this embodiment either before the inverse DCT, such as at 230, or after the inverse DCT, such as at 240, depending upon a variety of factors. In this particular embodiment, although the invention is not limited in scope in this respect, the blocks illustrated in FIG. 2 prior to the vertical line are implemented in software and the blocks after the vertical line are implemented in hardware. Conventionally, such video processing to accomplish downsampling would be performed in hardware; however, an advantage of an embodiment of a method of performing video decoding in accordance with the present invention is that it provides the capability to perform the processing in software due at least in part to greater processing efficiency in comparison with conventional approaches. Therefore, one advantage of this approach is that it provides greater flexibility. In such an embodiment, the decoder software may transfer the downsampled prediction error to the motion compensation hardware and the motion vectors may be adjusted substantially in accordance with the downsampling ratio, as explained hereinafter. In this particular embodiment, although, again, the invention is not limited in scope in this respect, downsampling ratios of 1:1, 2:1, 4:1, and 8:1, along the horizontal direction, the vertical direction, or both, may be supported. In this particular embodiment, where MPEG2 is employed, the downsampling ratio is limited to no more than 8:1 due to the native eight-by-eight MPEG2 block size. However, this limitation may not apply in alternative embodiments. Furthermore, in alternative embodiments, even for MPEG2, downsampling ratios other than a power of two may be implemented, such as, for example, 3:1.

[0027] As illustrated in FIG. 2, the motion compensation hardware may operate directly on the downsampled bit stream. In this particular embodiment, where MPEG2 is employed, the downsampling ratio may be n:1, where n equals 1, 2, 4, or 8. In a motion compensation process, a motion vector of a processed macroblock specifies the relative distance of reference data from the processed macroblock. Let $(V_x, V_y)$ = (vector[r][0], vector[r][1]) be the original motion vector for a macroblock, where $V_x$ and $V_y$, the horizontal and vertical components of the motion vector, are in the form of 16-bit signed values, although the invention is not limited in scope in this respect. According to the MPEG2 standard, the least significant bit (LSB) of $V_x$ and $V_y$ is used to indicate the half-pixel resolution reference. Denote the whole pixel motion displacement for the luminance component by $(D_x^Y, D_y^Y)$ and the fractional offset for the luminance component by $(F_x^Y, F_y^Y)$. Due to limited subpixel precision, the fractional offset $(F_x^Y, F_y^Y)$ may also be called the half-pixel offset flag. When there is no downsampling of the bitstream, these may be calculated from the motion vector as follows:

$$\begin{cases} D_x^Y = V_x \gg 1, \\ D_y^Y = V_y \gg 1, \end{cases} \quad \text{and} \qquad [1]$$

$$\begin{cases} F_x^Y = V_x \mathbin{\&} 1, \\ F_y^Y = V_y \mathbin{\&} 1, \end{cases} \qquad [2]$$

[0028] where “>>” indicates a right shift operation and “&” indicates a bitwise “logic AND” operation. If $F_x^Y$ is set, that is, non-zero, horizontal interpolation, such as computing an average, may be applied to the reference pixels. If $F_y^Y$ is set, vertical interpolation, such as computing an average, may be applied to the reference pixels. If both are set, interpolation along both directions may be applied.
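For illustration only, relations [1] and [2] may be sketched in Python as follows. The function name `split_luma_vector` is hypothetical and does not appear in the specification; the sketch assumes the motion vector components are ordinary signed integers in half-pixel units, and relies on Python's right shift being an arithmetic (flooring) shift for negative values.

```python
def split_luma_vector(vx, vy):
    """Split a half-pel motion vector into whole-pixel displacement and
    half-pixel offset flags, following relations [1] and [2].

    Hypothetical helper for illustration; vx and vy are signed integers
    expressed in half-pixel units.
    """
    # Relation [1]: the whole-pixel displacement is the vector shifted right by one.
    dx, dy = vx >> 1, vy >> 1
    # Relation [2]: the least significant bit is the half-pixel offset flag.
    fx, fy = vx & 1, vy & 1
    return (dx, dy), (fx, fy)


if __name__ == "__main__":
    # 5 half-pels = 2.5 pixels; -3 half-pels = -1.5 pixels = -2 + 0.5
    print(split_luma_vector(5, -3))
```

Because the shift floors toward negative infinity, the displacement plus half the flag reproduces the original vector for negative components as well, for example -2 + 0.5 = -1.5 pixels for a component of -3.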

[0029] The chrominance motion displacement may also be derived from the same set of motion vector signal information. For the YUV 4:2:0 color space format, for example, since the dimension of the chrominance (Cb, Cr) pictures is half that of the luminance component picture along both the horizontal and vertical directions, the whole pixel displacement $(D_x^C, D_y^C)$ and fractional offset $(F_x^C, F_y^C)$ for the chrominance components of the processed macroblock may be determined as follows:

$$\begin{cases} D_x^C = (V_x / 2) \gg 1, \\ D_y^C = (V_y / 2) \gg 1, \end{cases} \quad \text{and} \qquad [3]$$

$$\begin{cases} F_x^C = (V_x / 2) \mathbin{\&} 1, \\ F_y^C = (V_y / 2) \mathbin{\&} 1, \end{cases} \qquad [4]$$

[0030] where the symbol ‘/’ denotes regular integer division with truncation of the result toward zero. Notice that in this example the chrominance fractional offset is also in half chrominance pixel resolution.
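As a hedged sketch of relations [3] and [4], the following Python fragment derives the chrominance displacement and offset. The helper names `trunc_div2` and `split_chroma_vector` are hypothetical. Note that Python's `//` operator floors rather than truncates toward zero, so a separate helper is assumed here to reproduce the ‘/’ operation described above.

```python
def trunc_div2(v):
    """Integer division by 2 with truncation toward zero (the '/' of [3], [4]).

    Python's // floors toward negative infinity, so negative values are
    handled explicitly. Hypothetical helper for illustration.
    """
    return v // 2 if v >= 0 else -((-v) // 2)


def split_chroma_vector(vx, vy):
    """Chrominance whole-pixel displacement and half-pixel flags for 4:2:0."""
    cx, cy = trunc_div2(vx), trunc_div2(vy)   # halve the luminance vector
    displacement = (cx >> 1, cy >> 1)         # relation [3]
    fraction = (cx & 1, cy & 1)               # relation [4]
    return displacement, fraction


if __name__ == "__main__":
    print(split_chroma_vector(6, -5))
```

With a vertical component of -5, truncation gives -2 (floor division would give -3), which is why the explicit helper matters for negative vectors.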

[0031] Ignoring the luminance and chrominance superscripts of the terms $(D_x, D_y)$ and $(F_x, F_y)$, the motion prediction operation may be, in one embodiment, implemented with simple adders and shifters, as the following pseudo-code illustrates.

    MC_Prediction(p, q) {
        if (Fx != 1 && Fy != 1)   /* full-pel prediction in both directions */
            P[q][p] = R[n][m];
        if (Fx == 1 && Fy != 1)   /* full-pel vertical, half-pel horizontal */
            P[q][p] = (R[n][m] + R[n][m+1]) // 2;
        if (Fx != 1 && Fy == 1)   /* half-pel vertical, full-pel horizontal */
            P[q][p] = (R[n][m] + R[n+1][m]) // 2;
        if (Fx == 1 && Fy == 1)   /* half-pel prediction in both directions */
            P[q][p] = (R[n][m] + R[n][m+1] + R[n+1][m] + R[n+1][m+1]) // 4;
    } // end MC_Prediction(p, q)

[0032] In this example, the division symbol “//” denotes integer division with rounding to the nearest integer, half-integer values being rounded up to the next larger integer (rounding away from zero). Symbols p and q represent integer indices in the destination image along the horizontal and vertical directions, respectively. Symbols m and n represent integer indices in the reference image along the horizontal and vertical directions, respectively. The reference pixel location (m, n) may be derived from the motion vector displacement term $(D_x, D_y)$.
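The pseudo-code of the preceding paragraphs may be sketched in runnable form as follows. This is a minimal illustration, not a definitive implementation: the function name `mc_predict` is hypothetical, the reference image is assumed to be a list of rows of non-negative pixel values, and the rounding division is realized with an add-and-shift, which rounds half-integer values upward for non-negative sums.

```python
def mc_predict(R, m, n, fx, fy):
    """Half-pel motion-compensated prediction for one pixel.

    R      : reference image as a list of rows (R[row][column]), assumed non-negative
    m, n   : horizontal and vertical indices of the reference pixel
    fx, fy : half-pixel offset flags (0 or 1)
    """
    if fx and fy:
        # Half-pel in both directions: rounded average of four neighbours.
        total = R[n][m] + R[n][m + 1] + R[n + 1][m] + R[n + 1][m + 1]
        return (total + 2) >> 2
    if fx:
        # Half-pel horizontally: rounded average of two horizontal neighbours.
        return (R[n][m] + R[n][m + 1] + 1) >> 1
    if fy:
        # Half-pel vertically: rounded average of two vertical neighbours.
        return (R[n][m] + R[n + 1][m] + 1) >> 1
    # Full-pel in both directions: copy the reference pixel.
    return R[n][m]


if __name__ == "__main__":
    ref = [[10, 13], [20, 30]]
    print([mc_predict(ref, 0, 0, fx, fy) for fy in (0, 1) for fx in (0, 1)])
```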

[0033] In this particular embodiment of the invention, the motion-compensated prediction applied to the downsampled bit stream is performed directly using the downsampled reference images and the original motion vectors decoded from the compressed bitstream. The motion vectors used in the prediction may also be specified by the motion displacement $(D_x^Y, D_y^Y)$, $(D_x^C, D_y^C)$ and motion fractional offset $(F_x^Y, F_y^Y)$, $(F_x^C, F_y^C)$ with reference to the downsampled image. In contrast to the conventional motion fractional offset, which is only a one-bit value in MPEG2, as previously described, more precision is preserved for $(F_x^Y, F_y^Y)$, $(F_x^C, F_y^C)$ in a downsampling operation in this particular embodiment in accordance with the invention. Consequently, the simple averaging operation described above may be replaced by more accurate interpolation operations. In one embodiment, for example, a bilinear interpolation unit may be used in the motion prediction calculation of motion compensation, although the invention is not limited in scope in this respect. The video or image reconstruction quality may also be improved by using a higher order interpolation unit. A bilinear interpolator typically employs more hardware than an averaging-based interpolator. However, it is a common feature that may be provided as part of state-of-the-art graphics controller hardware. For example, it may be found in the texture pipeline of a three-dimensional (3D) rendering engine or an image processor for image scaling or filtering. In one embodiment, therefore, as illustrated in FIG. 8, a 3D pipeline may include a bilinear interpolator, designated 820 and 830, such as one having a 6-bit interpolation phase value, as shown by FIG. 10. In such an embodiment, although the invention is not limited in scope in this respect, motion displacement and motion fractional offset may be calculated from the motion vectors decoded from the compressed bitstream as follows:

$$\begin{cases} D_x^Y = V_x \gg SubD_x \\ D_y^Y = V_y \gg SubD_y \end{cases} \quad \text{and} \quad \begin{cases} D_x^C = (V_x / 2) \gg SubD_x \\ D_y^C = (V_y / 2) \gg SubD_y \end{cases} \quad \text{and} \qquad [5]$$

$$\begin{cases} F_x^Y = (V_x \mathbin{\&} FMaskD_x) \ll SubR_x \\ F_y^Y = (V_y \mathbin{\&} FMaskD_y) \ll SubR_y \end{cases} \quad \text{and} \quad \begin{cases} F_x^C = ((V_x / 2) \mathbin{\&} FMaskD_x) \ll SubR_x \\ F_y^C = ((V_y / 2) \mathbin{\&} FMaskD_y) \ll SubR_y. \end{cases} \qquad [6]$$

[0034] For these relationships and this embodiment, values for the subsampled displacement shifts $SubD_x$ and $SubD_y$, the subsampled fractional masks $FMaskD_x$ and $FMaskD_y$, and the subsampled bilinear interpolation phase shifters $SubR_x$ and $SubR_y$ may be determined based at least in part on the downsampling ratio. These are provided in Table 1, below, for a system with a 6-bit interpolation phase value range. It will be appreciated that the values for a system with a different interpolation precision may, likewise, be derived as desired. It will also be appreciated that the corresponding interpolation parameters for a system with an interpolation filter other than a bilinear interpolation filter may also be derived as desired.

TABLE 1. Variables that are used to set the bilinear interpolation parameters for downsampling.

    Downsampling ratio          1:1     2:1     4:1     8:1
    SubD_x or SubD_y            1       2       3       4
    FMaskD_x or FMaskD_y        0x01    0x03    0x07    0x0F
    SubR_x or SubR_y            5       4       3       2
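To show how the values of Table 1 combine with relations [5] and [6], the following Python sketch computes the luminance displacement and 6-bit fractional phase for one vector component. The dictionary `TABLE_1` and the function `downsampled_luma_component` are hypothetical names introduced for illustration; the sketch assumes the same ratio table applies independently to each direction.

```python
# Downsampling ratio -> (SubD, FMaskD, SubR), copied from Table 1.
TABLE_1 = {
    1: (1, 0x01, 5),
    2: (2, 0x03, 4),
    4: (3, 0x07, 3),
    8: (4, 0x0F, 2),
}


def downsampled_luma_component(v, ratio):
    """Return (D, F) for one luminance motion vector component.

    v     : signed motion vector component in half-pixel units
    ratio : downsampling ratio along this direction (1, 2, 4 or 8)
    F is a phase in units of 1/64 of a downsampled pixel (6-bit range).
    """
    sub_d, fmask, sub_r = TABLE_1[ratio]
    displacement = v >> sub_d            # relation [5]
    fraction = (v & fmask) << sub_r      # relation [6]
    return displacement, fraction


if __name__ == "__main__":
    # A vector component of 13 half-pels is 6.5 full-resolution pixels.
    for ratio in (1, 2, 4, 8):
        d, f = downsampled_luma_component(13, ratio)
        print(ratio, d, f, d + f / 64)
```

For a component of 13 (6.5 pixels at full resolution), the displacement plus the phase divided by 64 equals 6.5 divided by the downsampling ratio in every case, which is the sense in which the vector is scaled without loss of precision.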

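[0035] With the above equations defining the motion displacement and motion fractional values, the motion-compensated prediction may be described for this embodiment by the following bilinear interpolation relation:

$$\begin{aligned} P[q][p] &= \{(\text{0x40} - F_y)\cdot[(\text{0x40} - F_x)\cdot R[n][m] + F_x\cdot R[n][m+1]] + F_y\cdot[(\text{0x40} - F_x)\cdot R[n+1][m] + F_x\cdot R[n+1][m+1]]\} \mathbin{//} \text{0x1000} \\ &= \{(\text{0x40} - F_y)\cdot[(\text{0x40} - F_x)\cdot R[q+D_y][p+D_x] + F_x\cdot R[q+D_y][p+D_x+1]] + F_y\cdot[(\text{0x40} - F_x)\cdot R[q+D_y+1][p+D_x] + F_x\cdot R[q+D_y+1][p+D_x+1]]\} \mathbin{//} \text{0x1000}, \end{aligned} \qquad [7]$$

[0036] where the reference pixel location (m, n) = $(p + D_x, q + D_y)$ is derived from the motion vector displacement term $(D_x, D_y)$. The divisor 0x1000 equals 0x40·0x40, the sum of the four interpolation weights, so that for a half-pixel offset (F = 0x20) the relation reduces to the averaging operations described previously.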
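The bilinear interpolation relation [7] may be sketched in Python as follows. The function name `bilinear_predict` is hypothetical. As an assumption of this sketch, the weighted sum is normalized by 0x40·0x40 = 0x1000, the sum of the four weights, and the rounding division is realized by adding half the divisor before a 12-bit right shift, which is valid for non-negative pixel values.

```python
def bilinear_predict(R, p, q, dx, dy, fx, fy):
    """Bilinear motion-compensated prediction for destination pixel (p, q).

    R      : reference image as a list of rows, R[row][column]
    dx, dy : whole-pixel displacement in the downsampled image
    fx, fy : fractional phase in units of 1/64 pixel (0 .. 63)
    """
    m, n = p + dx, q + dy                         # reference pixel location
    # Horizontal interpolation on the two rows involved.
    top = (0x40 - fx) * R[n][m] + fx * R[n][m + 1]
    bottom = (0x40 - fx) * R[n + 1][m] + fx * R[n + 1][m + 1]
    # Vertical interpolation between the two rows.
    acc = (0x40 - fy) * top + fy * bottom
    # Assumed normalization: weights sum to 0x40 * 0x40 = 0x1000; round to nearest.
    return (acc + 0x800) >> 12


if __name__ == "__main__":
    ref = [[10, 13], [20, 30]]
    print(bilinear_predict(ref, 0, 0, 0, 0, 0x20, 0x20))
```

With both phases set to 0x20 (one half), the result coincides with the four-pixel rounded average of the half-pel pseudo-code, which provides a simple consistency check between the two forms of prediction.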

[0037] FIG. 9 illustrates the spatial relationship of the four reference pixels contributing to the prediction of the reconstructed pixel for this embodiment. As previously indicated, one embodiment of a bilinear interpolator is illustrated in FIG. 10. In this embodiment, bilinear interpolator 1001 is formed by two linear interpolators, 1020 and 1030, that operate along one spatial direction, followed by another linear interpolator, 1040, that operates along the orthogonal direction. The output signal from linear interpolator 1040 passes through a rounding and saturation unit 1050 that converts the output signal to a specified finite precision form, although the invention is not limited in scope in this respect, and this is just one example of a bilinear interpolator embodiment. Furthermore, the invention is not limited in scope to employing bilinear interpolation.

[0038] For this embodiment, the above mentioned motion compensation operation may be implemented with a hardware motion compensation system, such as the one, 701, illustrated in FIG. 7, although, again, the invention is not limited in scope in this respect. Here, the operation is applied on a macroblock basis; however, as previously indicated, this is merely a feature of MPEG2 and alternative embodiments are possible. The operation of this particular embodiment shall now be described in detail. Command parser and address generator unit 810 receives motion compensation instructions for a given macroblock, generates destination addresses, and transmits the destination addresses to correction data memory interface unit 815 and destination data memory interface unit 825. Correction data memory interface unit 815 uses this (these) destination address(es) to load correction data from a correction data buffer (not shown). Destination data memory interface unit 825 uses this (these) destination address(es) to send the final output data from the motion compensation engine to the destination buffer (not shown). Command parser and address generator unit 810 also generates a prediction address (or addresses) in the reference picture or image using information about the current macroblock and its motion vectors and sends this to reference data memory interface unit 835. The reference data memory interface unit uses this to load data from a forward reference buffer, or from a backward reference buffer, or from both a forward reference buffer and a backward reference buffer (not shown).

[0039] The command parser and address generator unit also generates subpixel fractional information to be applied to the bilinear interpolation units, 820 and 830. Of these two bilinear interpolation units, one performs forward prediction and one performs backward prediction. Here, each bilinear interpolation unit uses the fractional information to interpolate data from the reference buffer. It is conceivable that these two bilinear interpolation units may be implemented as a single hardware unit. Where a single hardware bilinear interpolation unit is implemented, this bilinear interpolation unit may be used sequentially if forward and backward bi-directional prediction is desired.

[0040] The output signals from the forward bilinear interpolation unit and the backward bilinear interpolation unit are added together in combine predictions unit 850. The combine predictions unit performs proper scaling and saturation of the data, such as according to a compression standard, for example, MPEG2. The output signal from the combine predictions unit is then sent to prediction correction unit 860, where the correction data are added to the motion prediction data and the final output data, for this embodiment, are generated. The output data from the prediction correction unit are then sent to memory by the destination data memory interface.
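The combine-predictions and prediction-correction stages described above may be sketched for a single pixel as follows. This is an illustrative assumption of how the scaling and saturation might be carried out for 8-bit MPEG2 data: bi-directional predictions are averaged with rounding, the correction (prediction error) is added, and the result is saturated to the range 0 to 255. The function name `reconstruct_pixel` is hypothetical.

```python
def reconstruct_pixel(forward, backward, correction):
    """Combine predictions and apply the correction for one pixel.

    forward, backward : interpolated predictions, or None when that
                        prediction direction is not used
    correction        : signed prediction error for this pixel
    """
    if forward is not None and backward is not None:
        # Bi-directional prediction: rounded average of both predictions.
        prediction = (forward + backward + 1) >> 1
    elif forward is not None:
        prediction = forward
    else:
        prediction = backward
    # Prediction correction followed by saturation to 8-bit range (assumed).
    return max(0, min(255, prediction + correction))


if __name__ == "__main__":
    print(reconstruct_pixel(100, 103, -5))
    print(reconstruct_pixel(250, None, 20))
```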

[0041] As illustrated in FIG. 8, the above mentioned embodiment of motion compensation may be implemented using existing 3D rendering hardware that is currently a common feature in graphics controller hardware. The boxes in dotted lines map the motion compensation aspects of this embodiment just described onto some 3D hardware units, for illustration purposes. Of course, other hardware mappings and hardware reuse are also possible and may now be implemented by one of ordinary skill in the art. In this particular embodiment, the reference buffers are mapped as texture buffers. Therefore, the texture memory and texture cache may be used to load the reference data from memory. After that, the 3D texture pipeline, which typically contains bilinear interpolators or even tri-linear interpolators, may be used to perform the bilinear interpolation and prediction combination operations in motion compensation. Then, the 3D texture blend unit may be used to perform the prediction correction operation. The 3D color and destination memory interface unit may be used to write the output signals of motion compensation to memory.

[0042] Several embodiments where MPEG2 coding has been employed shall be described. As previously explained, the invention is not limited in scope to these particular embodiments. Any one of a number of other video or image coding specifications and/or storage formats may be employed. Nonetheless, these embodiments are provided as examples of implementations of a method of performing video image decoding in accordance with the present invention. In this context, three main categories of MPEG2 coding types shall be described. One coding type comprises a frame image with frame prediction or frame motion compensation employed. In this context, the term frame image or frame type refers to a progressive sequence display of data signals for an image, such as is commonly employed on computer platforms having monitors. The term frame prediction or frame motion compensation refers to a particular format for the prediction error and for the motion vectors that have been coded or produced by an encoder. It is desirable to know the format in which this signal information is encoded in the bitstream in order to perform decoding to reconstruct the image that produced this signal information. Therefore, if frame prediction or frame decoding is employed, then the prediction error is stored in a frame format, analogous to the format employed for a frame image. A second coding type comprises a field image with field motion compensation or field prediction. The term field image or field type generally refers to a technique commonly employed for television sets or television set displays in which half of the image is displayed separately at a rate that allows the human eye to merge the images. In this format, field data lines, that is, lines of signal data from a field image, are stored in an interlaced format. Therefore, top field and bottom field lines are alternated or interlaced within a frame of signal data. The term field motion compensation or field prediction refers to the format in which the prediction error and motion vectors are stored, in which prediction may be predicated upon the so-called top fields or bottom fields independently. In a field encoded image, the top and bottom fields are each encoded as separate images, and then displayed in an interlaced format. The motion prediction data for the top and bottom fields in this case is based in part on recently decoded fields. A third MPEG2 coding type employed in this context comprises a frame image with field motion compensation or field prediction. In this format, both fields are encoded as a single image, but the motion compensation data for each of its two fields is based in part on previously decoded fields. In MPEG2, this third format has two variations. In one variation, such as illustrated in FIG. 5, the luminance DCT data is encoded on a frame basis, while in the other variation, such as illustrated in FIG. 6, the luminance DCT data is stored on a field basis. This coding type is between the two coding types mentioned above in that both formats may be interspersed on a macroblock basis. More specifically, on a macroblock basis, data signals may be stored as a frame image with either field or frame prediction.

[0043] Because these particular embodiments relate to a DCT domain downsampling implementation for MPEG2 coding types, the description below addresses downsampling and motion compensation applied to the vertical direction. The horizontal direction in a video frame is handled similarly for the MPEG2 coding types described above, and therefore, in this embodiment, the horizontal direction is handled in a manner similar to the approach described below for a frame image with frame prediction, although, in a particular implementation of video image decoding in accordance with the present invention, this aspect may vary. Further, the illustrations given herein illustrate the technique for the luminance component only. Nevertheless, an extension of this technique, once described, to handle the chrominance component of MPEG is within the ability of one of ordinary skill in the art. Further, in other applications with multiple components encoded in the bitstream, such as, but not limited to, RGB encoded JPEG images, the extension of the technique described herein to each of the components is within the ability of one of ordinary skill in the art.

[0044] FIG. 3 is a schematic diagram of an embodiment of a method for performing video image decoding in accordance with the present invention in which a DCT image that complies with the MPEG2 specification is employed. In this particular embodiment, a frame image with frame motion compensation, as described above, is the MPEG2 coding type employed. FIG. 3 illustrates two 8×8 luminance blocks in a macroblock in which downsampling in the DCT domain occurs. Column 1 illustrates spatial positioning for data lines of the two blocks prior to downsampling. Column 2 illustrates spatial locations for the data lines after downsampling. Therefore, column 2 illustrates the effect on the data positioning of downsampling in the DCT domain and then performing the inverse DCT. Likewise, as FIG. 3 illustrates, column 3 illustrates downsampling for a ratio of 4:1, as opposed to a ratio of 2:1 for column 2. As shown in FIG. 3, the downsampled lines are uniformly distributed in space after downsampling and inverse transforming. This would occur in this embodiment in a similar way for the downsampled pixels in the horizontal direction. Therefore, the downsampled frame image and frame motion vectors may be handled in a manner similar to the approach applied to the original image. The result of downsampling is to convert the 16×16 macroblocks and the 8×8 blocks they contain to smaller blocks. For example, after 2:1 horizontal subsampling and 4:1 vertical subsampling, each 8×8 block is decoded into a 4×2 block, and each 16×16 macroblock is decoded into an 8×4 macroblock. Thus, motion compensation for any given downsampled block, such as blocks with size 4×4, 4×2, 2×4, 2×2, or 1×1 in this embodiment, may be directly conducted on the downsampled references using scaled motion vectors employing, in this particular embodiment, the technique described previously. Therefore, although the invention is not limited in scope in this respect, the previously described motion compensation hardware may be efficiently employed to perform this signal processing operation.
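As one hedged illustration of downsampling in the DCT domain followed by an inverse DCT, the following Python sketch operates on a single column of eight samples. The specific method shown, retaining only the lowest-frequency coefficients, rescaling them, and applying a shorter inverse DCT, is an assumption chosen for illustration and is not stated in this description; the function names are likewise hypothetical. An orthonormal DCT-II is assumed.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def idct(X):
    """Inverse of dct() for a coefficient list of any length."""
    n = len(X)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def downsample_in_dct_domain(X, ratio):
    """Keep the len(X)/ratio lowest-frequency coefficients (assumed method).

    The gain keeps the mean sample value unchanged after the shorter
    inverse transform.
    """
    keep = len(X) // ratio
    gain = math.sqrt(keep / len(X))
    return [gain * c for c in X[:keep]]


if __name__ == "__main__":
    column = [0, 1, 2, 3, 4, 5, 6, 7]          # eight lines of one block
    small = idct(downsample_in_dct_domain(dct(column), 2))
    print(len(small), small)
```

Under these assumptions an eight-line column becomes four lines for a 2:1 ratio or two lines for a 4:1 ratio, consistent with the block sizes given above, and a flat block remains flat at the same level.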

[0045] FIG. 4 is a diagram illustrating an embodiment of a method for performing video image decoding in accordance with the present invention in which another MPEG2 coding type is employed. In this particular embodiment, a field image with field motion compensation is employed, as described above. Considering the nature of the two temporally separated fields for one frame, field based downsampling may introduce spatial aliasing and/or a non-uniform positioning of the lines from the two fields. The non-uniform positioning that may result is illustrated in FIG. 4 in which, again, downsampling has been applied, and then the inverse DCT, to illustrate the effect on this coding type. However, the non-uniform line spacing does not affect motion vectors. Likewise, adjustments to the line positioning illustrated in FIG. 4, such as for the prediction error, may be accomplished using interpolation techniques such as bilinear interpolation. Again, the 3D hardware pipeline previously described may be employed to implement these interpolations. Therefore, in this particular embodiment, motion compensation, as well as the spatial positioning of the downsampled blocks, should include the exact line positioning for each field.
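The non-uniform line positioning that results from field based downsampling may be worked through numerically. The following Python sketch assumes, for illustration only, that each downsampled line sits at the centroid of the source lines of its own field that it replaces, with top field lines at even frame-line positions and bottom field lines at odd positions. The function names are hypothetical.

```python
from fractions import Fraction


def field_line_positions(ratio, lines_per_field=8):
    """Positions, in original frame-line units, of downsampled field lines.

    Assumes each output line lies at the centroid of `ratio` consecutive
    source lines of the same field.
    """
    top = [2 * i for i in range(lines_per_field)]          # 0, 2, 4, ...
    bottom = [2 * i + 1 for i in range(lines_per_field)]   # 1, 3, 5, ...

    def centroids(lines):
        return [Fraction(sum(lines[i:i + ratio]), ratio)
                for i in range(0, len(lines), ratio)]

    return centroids(top), centroids(bottom)


def bottom_line_offset(ratio):
    """Fraction of the top-line spacing at which the first bottom line falls."""
    top, bottom = field_line_positions(ratio)
    return (bottom[0] - top[0]) / (top[1] - top[0])


if __name__ == "__main__":
    for ratio in (1, 2, 4):
        print(ratio, bottom_line_offset(ratio))
```

Under this assumption the bottom field lines fall half-way between top field lines without downsampling, one quarter of the way at 2:1, and one eighth of the way at 4:1, which corresponds to the scenarios of FIGS. 12, 13 and 14.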

[0046] In another embodiment, instead of employing the approach illustrated in FIG. 4, which produced non-uniform vertical line spacing, selected taps may be employed for the top field and bottom field lines to produce a downsampled image that is uniformly spaced in the vertical direction. For example, although the invention is not limited in scope in this respect, two spatial filters, one respectively for each of the bottom and top fields, may be employed. In addition, a similar approach may alternatively be employed in the frequency domain, such as the DCT domain. Where it is employed in the frequency domain, the transformed data signals may be phase shifted, rather than spatially shifted. The relation between a spatial shift and its corresponding transform domain operation may be derived using the convolution property of the particular transform.

[0047] FIGS. 5 and 6 each illustrate portions of embodiments of a method for performing video image decoding in accordance with the present invention for an MPEG2 coding type described as a frame image with field motion compensation. FIG. 5 illustrates application of a portion of an embodiment of a method for performing video decoding in accordance with the invention to a macroblock stored in this format as a frame type with field prediction and frame DCT. In contrast, FIG. 6 illustrates application of a portion of an embodiment of a method for performing video decoding in accordance with the present invention to a macroblock stored in this format as a frame type with field prediction and field DCT. It may be convenient to convert the image data and prediction or motion compensation data to one format, either frame or field. Likewise, conversion to a frame format may generally involve temporal filtering, which might involve a modification of the previously illustrated 3D pipeline hardware. However, of course, the invention is not limited in scope in this respect and this approach may be employed with a hardware pipeline, for example, that includes this feature. In this particular embodiment, however, operations are performed to place the frame data in a field format, and to place the frame motion compensation data into a field motion compensation format. Each field is then processed separately in the spatial domain to accomplish motion compensation, in this particular embodiment.

[0048] One modification, then, for this particular embodiment is to convert a frame downscaled macroblock into a field downscaled macroblock. In this particular embodiment, as illustrated in FIG. 5, this is accomplished by reconstruction of the blocks in a macroblock at full vertical resolution in the spatial domain by inverse transformation from the DCT domain, interlacing the block into two fields and downscaling it vertically in the spatial domain. Therefore, for this embodiment, the vertical downscaling is effectively moved to after performing the inverse DCT, as illustrated in FIG. 1. Likewise, motion compensation is performed on each field separately, as mentioned above. If the motion compensation were frame based, then, in this embodiment, the prediction error could be converted to field based using the technique illustrated. To convert frame motion vectors to field based, the frame motion vector may be employed for each of the top and bottom field motion vectors. A difference between the embodiments illustrated in FIG. 5 and FIG. 6 is whether the macroblock is stored as a frame macroblock or a field macroblock. As previously discussed, if it is stored as a frame macroblock, then interleaving is performed as illustrated in FIG. 5. In contrast, as illustrated in FIG. 6, if the macroblock is stored as a field macroblock, then interleaving is performed, as illustrated, and the data lines may be processed as previously described for an interleaved field format.
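The frame-to-field conversion described in this paragraph may be sketched as follows. This is only an illustrative sketch under stated assumptions: the helper names are hypothetical, the block is assumed to be already inverse transformed at full vertical resolution, and the 2:1 vertical downscaling is assumed to be a simple average of adjacent field lines, which is one possible linear filter and not necessarily the one used in any particular embodiment.

```python
def split_fields(frame_lines):
    """Separate a spatial-domain frame block into its two fields.

    Even-numbered lines form the top field, odd-numbered lines the bottom.
    """
    return frame_lines[0::2], frame_lines[1::2]


def downscale_vertical_2to1(field_lines):
    """Average each pair of adjacent field lines (assumed filter)."""
    pairs = zip(field_lines[0::2], field_lines[1::2])
    return [[(a + b) / 2 for a, b in zip(upper, lower)]
            for upper, lower in pairs]


def frame_mv_to_field_mvs(frame_mv):
    """Reuse the frame motion vector for both fields, as the text describes."""
    return frame_mv, frame_mv


# An 8-line, 2-pixel-wide block whose pixel values equal the line number.
frame = [[float(line)] * 2 for line in range(8)]
top, bottom = split_fields(frame)
top_small = downscale_vertical_2to1(top)
bottom_small = downscale_vertical_2to1(bottom)
```

Each field is then available separately for motion compensation in the spatial domain.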

[0049] An aspect of an embodiment in accordance with the invention is the downscaling of a video image in the frequency domain, such as an MPEG2 image in the DCT domain, although the invention is not limited in scope in this respect. This may be discussed by referring to one-dimensional (1D) signals. The results for 2D signals would be an extension of this approach due to the separability of operations. Likewise, the case of 2:1 downscaling will be discussed as representative of other downscaling ratios. In general, implementing downscaling in the frequency domain is well-known and there are many well-known ways to accomplish it. The invention is not restricted in scope to a particular approach and this discussion is provided as only one example.

[0050] The filtering of finite digital signals in the sample domain is performed using convolution. A well-known circular convolution may be obtained, for example, by a periodic extension of the signal and filter. It may be efficiently performed in the discrete Fourier transform (DFT) domain by simple multiplication of the discrete Fourier transforms of the signal and filter and then applying the inverse DFT to the result. For the DCT, a convolution may be applied that is related to, but different from, the DFT convolution. This is described, for example, in “Symmetric Convolution and the Discrete Sine and Cosine Transforms,” by S. Martucci, IEEE Transactions on Signal Processing, Vol. 42, No. 5, May 1994, and includes a symmetric extension of the signal and filter, linear convolution, and applying a window to the result. For example, assuming that the signal is represented as s(n), n=0, . . . , N−1, and its corresponding transform (DCT domain) coefficients are represented as S(u), u=0, . . . , N−1, and the filter is represented as h(m), m=0, . . . , M−1, then the DCT may be represented in matrix form as S=C*s, with s and S being the column vector forms of the signal and its DCT coefficients and C being the DCT matrix, as follows:

$C_{u,n} = \sqrt{\frac{2}{N}}\, k(u) \cos\left[\frac{\pi u (2n+1)}{2N}\right]$, where u, n = 0, . . . , N−1  [8]

[0051] where

$k(u) = \begin{cases} 1/\sqrt{2}, & u = 0 \\ 1, & u = 1, \ldots, N-1 \end{cases}$  [9]
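Equations [8] and [9] can be written out directly as a small program. The following is a minimal sketch in plain Python; the function names `dct_matrix` and `forward_dct` are hypothetical and chosen only for illustration. It builds the matrix C and applies S=C*s to a column of samples.

```python
import math


def dct_matrix(n_points):
    """Build the N x N DCT matrix C of equation [8], with k(u) from [9]."""
    def k(u):
        return 1.0 / math.sqrt(2.0) if u == 0 else 1.0

    return [[math.sqrt(2.0 / n_points) * k(u)
             * math.cos(math.pi * u * (2 * n + 1) / (2 * n_points))
             for n in range(n_points)]
            for u in range(n_points)]


def forward_dct(signal):
    """Compute S = C * s for a list of samples s(n)."""
    c = dct_matrix(len(signal))
    return [sum(c_un * s_n for c_un, s_n in zip(row, signal)) for row in c]


# A constant signal puts all of its energy into the u = 0 coefficient.
coeffs = forward_dct([1.0] * 8)
```

With this normalization the rows of C are orthonormal, so the inverse transform is simply multiplication by the transpose of C.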

[0052] Assuming a symmetric, low pass, even length filter h(m) with filter length M, where M=2*N, the DCT coefficients H(u) for the filter may be obtained by applying the convolutional form described above to the right half of the filter, which is equivalent to multiplication of the right half coefficients by the transform matrix:

$D_{u,m} = 2 \cos\left[\frac{\pi u (2m+1)}{2N}\right]$, where u, m = 1, . . . , N−1  [10]

[0053] The filtering is then performed by element-by-element multiplication of the signal DCT coefficients and the filter DCT coefficients and taking the appropriate inverse DCT transform of the DCT-domain multiplication results:

$Y(u) = S(u)\,H(u)$, where u = 0, . . . , N−1  [11]

[0054] Not only filtering, but also downsampling, may be performed in the DCT domain. For downsampling by two, the result of the element-by-element multiplication is folded across the middle point and subtracted, and after that scaled by $1/\sqrt{2}$. Mathematically, this is illustrated as:

$\frac{Y(u) - Y(N-u)}{\sqrt{2}}$, where u = 0, . . . , (N/2)−1  [12]

[0055] The decimated signal is then obtained by applying the inverse DCT transform of length N/2. There are several special cases that might be usefully applied in this embodiment, although the invention is not limited in scope in this respect. For example, a brickwall filter with coefficients [1 1 1 1 0 0 0 0] in the DCT domain may be implemented that can further simplify the DCT domain downsampling by two operation. Specifically, the special filter shape avoids folding and addition. Another filter with coefficients [1 1 1 1 0.5 0 0 0] provides a transfer function of an antialiasing filter for the downsampling by two operation. Other filters may also be employed, of course.
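The brickwall special case may be sketched in a few lines. The sketch below is an illustration under stated assumptions, not a definitive implementation: it uses the orthonormal DCT of equation [8], it assumes a scale factor of 1/√2 (the value that keeps a constant signal constant under that normalization), and it starts from spatial samples only so that the result can be checked, whereas a decoder would start from the DCT coefficients in the bit stream. All function names are hypothetical. Because the brickwall filter zeroes the upper half of the coefficients, the folded term vanishes and only truncation and scaling remain.

```python
import math


def dct_matrix(n_points):
    """Orthonormal DCT matrix of equation [8]."""
    def k(u):
        return 1.0 / math.sqrt(2.0) if u == 0 else 1.0

    return [[math.sqrt(2.0 / n_points) * k(u)
             * math.cos(math.pi * u * (2 * n + 1) / (2 * n_points))
             for n in range(n_points)]
            for u in range(n_points)]


def forward_dct(signal):
    c = dct_matrix(len(signal))
    return [sum(c_un * s_n for c_un, s_n in zip(row, signal)) for row in c]


def inverse_dct(coeffs):
    """Inverse transform: multiply by the transpose of C."""
    size = len(coeffs)
    c = dct_matrix(size)
    return [sum(c[u][n] * coeffs[u] for u in range(size))
            for n in range(size)]


def downsample_by_two_brickwall(signal):
    """Keep the low N/2 DCT coefficients, scale, and invert at length N/2."""
    n_points = len(signal)
    coeffs = forward_dct(signal)
    kept = [coeffs[u] / math.sqrt(2.0) for u in range(n_points // 2)]
    return inverse_dct(kept)


flat = downsample_by_two_brickwall([5.0] * 8)
```

A constant input of eight samples yields four samples of the same value, and a low-frequency cosine basis function is reproduced at the coarser sampling grid.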

[0056] Likewise, it will be appreciated that in this particular embodiment, a low pass, linear interpolation filter has been implemented to perform the downsampling; nonetheless, the invention is not limited in scope in this respect. For example, linear filters other than low pass filters or, alternatively, non-linear filters, such as, for example, a median filter or an adaptive edge-enhancement filter, may be employed. It will, of course, be appreciated that some linear filters may effectively be implemented using motion compensation hardware and bilinear interpolation, although the invention is not limited in scope in this respect.

[0057] Filtering may also be applied after motion compensation or downsampling. More specifically, variations in clarity of the resulting images may become apparent to the human eye, particularly as the images are viewed in sequence. In some embodiments, it may be desirable to smooth these variations or, alternatively, enhance the images having less clarity. Therefore, any one of a number of filters, linear or non-linear, may be applied. For example, an edge enhancement filter may be applied, although the invention is not limited in scope in this respect. Again, it will be appreciated that some linear filters may be effectively implemented using a 3D hardware pipeline and bilinear interpolation.

[0058] Of course, as previously indicated, the invention is not restricted in scope to the embodiments previously described. For example, in an alternative embodiment, where a 3D hardware pipeline is employed to implement a bilinear interpolation operation, a 3×3, 4×4, or greater interpolation operation may be implemented in place of a 2×2 bilinear interpolation operation. Likewise, in another alternative embodiment, as greater computational resources are demanded by the decoder in order to keep up with the video bit stream being provided or received, the decoder may be adapted to downsample at higher ratios in order to allow graceful degradation in the quality of the images provided. Likewise, the decoder may be adapted to perform the reverse as well.

[0059] In another embodiment, instead of downsampling all video images, the decoder may be adapted to downsample only some of the video images. For example, specific images may be selected for downsampling, such as by transmitting a signal indication, or the decoder may be adapted to downsample a subset of the received video images based at least in part on a predetermined criterion, such as, as one example, decoding I and P frames at full resolution while subsampling B frames. Therefore, any one of a number of approaches may be employed and the invention is not restricted in scope to any particular approach.
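The example criterion in this paragraph, full resolution for I and P frames and subsampling for B frames, may be sketched as a simple selection function. The name `should_downsample` and the single-letter picture type strings are assumptions made only for this illustration.

```python
def should_downsample(picture_type, downsample_types=("B",)):
    """Return True if a picture of this type is to be downsampled.

    Assumption: picture types are given as the strings "I", "P" and "B",
    and the default policy subsamples B frames only.
    """
    return picture_type in downsample_types


# Decide per picture for a short, hypothetical sequence of picture types.
decisions = [(p, should_downsample(p)) for p in ["I", "B", "B", "P"]]
```

A different criterion, or an explicit signal indication, could be substituted by changing `downsample_types` or replacing the function.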

[0060] Another aspect of an embodiment in accordance with the invention is the display of the decoded video images that are downsampled in the frequency domain, such as an MPEG2 image in the DCT domain, although the invention is not limited in scope in this respect. In this particular embodiment, the video decoder subsystem discussed above is coupled to a video display subsystem, as illustrated in FIG. 10. Both the video decoder subsystem and the video display subsystem may couple with the memory subsystem, where decoded video images may reside. As illustrated in FIG. 10, in the memory subsystem, the decoded video images are labeled as video buffer 1, video buffer 2 and so on. The number n of decoded video images may be chosen according to the video decoder and video display subsystems. In such an embodiment, besides typical information, such as the decoded image size (X, Y), the video decoder subsystem may couple with the video display subsystem with additional signals, such as the Picture Type (PICT) and the vertical subsampling factor (VSFF), that relate to the transform-domain downsampling operation. Signals such as PICT and VSFF may be used to adjust the video display subsystem to properly display the decoded video images that are downsampled in the transform domain using an embodiment in accordance with the invention.

[0061] The video display subsystem handles displaying the decoded video images on the screen. The size of the desired display video window may not be the same as the source video image. In this case, the source video may be scaled up or down to match the display window size, corresponding to the process of interpolation and decimation, respectively. Quality scaling involves proper filtering of the source video data to reduce aliasing artifacts. In one approach, a finite impulse response (FIR) filter, where only a finite number of input pixels contribute to a particular output pixel, is an example of a scaling filter implemented in the video display subsystem. A filter for spatial scaling of a video signal is normally a 2-dimensional (2D) function. In practice, separable filters may be used to reduce the hardware complexity and cost. In other words, the scaling of a video signal is applied to the vertical and horizontal directions independently. In the following, the vertical scaling operation is addressed since it is relevant to the uniform and non-uniform field scan line distribution that the proposed video decoder generates.

[0062] For a given source size N_(src) and a destination size N_(dest), the forward scaling factor (in contrast to the backward scaling factor that we will define later) is defined as the ratio of the source size over the destination size: $S_{f} = \frac{N_{src}}{N_{dest}}$  [13]

[0063] Denoting the source sampling step as unity, we can define a DDA (Digital Differential Analyzer) value for a given output line as the relative position to the source line vertical positions. Normally, a DDA accumulator contains a fixed-point value. The integer portion of the DDA value, denoted by int(DDA), indicates the closest source line number, while the fractional portion of the DDA value, denoted by fract(DDA), corresponds to the relative distance from that source line. The initial phase of a scaling operation is defined as the initial value of the DDA accumulator (DDA₀=DDA(0)) that is associated with the first output line from the scaling filter. Then the sample position of a succeeding output line may be described by the DDA value accumulated by the scaling factor.

$DDA(n) = DDA(n-1) + S_{f}$, for n = 1, . . . , N_(dest)−1,  [14]

[0064] where n is the index to the output video lines.
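The DDA recurrence of equation [14] may be sketched as follows. This is a minimal sketch under stated assumptions: the function name `dda_positions` is hypothetical, exact fractions stand in for the fixed-point accumulator, and the integer portion is taken with a floor so that the fractional portion stays non-negative even for a negative initial phase. A hardware implementation may split the value differently.

```python
import math
from fractions import Fraction


def dda_positions(n_src, n_dest, initial_phase=Fraction(0)):
    """Return (int(DDA), fract(DDA)) for every output line.

    The accumulator starts at the initial phase and advances by the
    forward scaling factor S_f = n_src / n_dest for each output line.
    """
    s_f = Fraction(n_src, n_dest)
    dda = Fraction(initial_phase)
    positions = []
    for _ in range(n_dest):
        source_line = math.floor(dda)      # integer portion (assumed floor)
        positions.append((source_line, dda - source_line))
        dda += s_f
    return positions


# Three source lines scaled to eight output lines, chosen for illustration.
top_field = dda_positions(3, 8)
bottom_field = dda_positions(3, 8, initial_phase=Fraction(-1, 2))
```

Each returned pair names the source line above the output line and the relative distance from it, which is the information a vertical interpolation filter would consume.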

[0065] For a source video image that is created by the above mentioned video decoder subsystem and is in a frame type with the transform domain downsampling as illustrated in FIG. 3, its display is similar to the non-downsampled video image, although the scaling factor is different.

[0066] For a source video image that is created by the above mentioned video decoder subsystem and is in a field type with the transform domain downsampling but with uniformly distributed scan lines as illustrated in FIG. 12, again, its display method is similar to the non-downsampled field video image, although the scaling factor is different. However, for a source video image that is created by the above mentioned video decoder subsystem and is in a field type with the transform domain downsampling but with non-uniformly distributed scan lines, as illustrated in FIG. 13 and FIG. 14, the conventional field video display method cannot be applied. Instead, proper vertical position adjustment is employed to display the top and bottom fields of the transform-domain downsampled video images correctly.

[0067] Let the distance between two adjacent lines in a field be 1 unit. As illustrated in FIG. 12, for the non-downsampled field-type video image, the first line in the bottom field (line 1) is 0.5 units below the first line in the top field (line 0). This is also true for the subsequent lines in the top and bottom fields. The results of a DDA-based vertical scaling operation for a uniformly-positioned interlaced video source are illustrated in FIG. 15. The example shows the upscaling factor of 3:8. FIG. 15(a) is the case of scaling from the top field with an initial phase of DDA[0]=0.0, and FIG. 15(b) is the case of scaling from the bottom field with an initial phase of DDA[0]=−0.5.

[0068] When vertical downsampling by two is performed in the transform domain, the first line in the bottom field (line 1) is 0.25 units below the first line in the top field (line 0) as illustrated in FIG. 13. FIG. 16 illustrates the results of a DDA-based vertical scaling operation for a non-uniformly-positioned interlaced video source. The example shows the upscaling factor of 3:8. FIG. 16(a) is the case of scaling from the top field with an initial phase of DDA[0]=0.0, and FIG. 16(b) is the case of scaling from the bottom field with an initial phase of DDA[0]=−0.25.

[0069] Similarly, FIG. 14 illustrates that the first line in the bottom field is 0.125 units below the first line in the top field when a vertical downsampling by four is performed in the transform domain.
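The offsets of 0.5, 0.25 and 0.125 units quoted above follow one pattern, which the sketch below reproduces. It rests on an assumption that is not stated in the text: each downsampled line is taken to sit at the center of the group of source field lines it replaces. The function name `bottom_field_offset` is hypothetical.

```python
from fractions import Fraction


def bottom_field_offset(v_factor):
    """Offset of the first bottom-field line below the first top-field line.

    Positions are counted in frame lines: top-field lines sit at 0, 2, 4, ...
    and bottom-field lines at 1, 3, 5, ... The result is expressed in units
    of the spacing between adjacent lines of one downsampled field.
    """
    top = [Fraction(2 * i) for i in range(v_factor)]
    bottom = [Fraction(2 * i + 1) for i in range(v_factor)]
    top_center = sum(top) / v_factor          # assumed line position
    bottom_center = sum(bottom) / v_factor    # assumed line position
    field_line_spacing = 2 * v_factor         # frame lines per field line
    return (bottom_center - top_center) / field_line_spacing


offsets = {factor: bottom_field_offset(factor) for factor in (1, 2, 4)}
```

Under this assumption the offset equals 1/(2 × factor), and its negative is the initial DDA phase used for the bottom field in the examples above.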

[0070] It will, of course, be understood that, although a particular embodiment has just been described, the invention is not limited in scope to a particular embodiment or implementation. For example, one embodiment may be in hardware, whereas another embodiment may be in software. Likewise, an embodiment may be in firmware, or any combination of hardware, software, or firmware, for example. Likewise, although the invention is not limited in scope in this respect, one embodiment may comprise an article, such as a storage medium. Such a storage medium, such as, for example, a CD-ROM, or a disk, may have stored thereon instructions, which when executed by a system, such as a computer system or platform, or an imaging system, may result in a method of performing video image decoding in accordance with the invention, such as, for example, one of the embodiments previously described.

[0071] While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such embodiments and changes as fall within the true spirit of the invention.

1. A method of performing video image decoding comprising: downsampling a compressed video image in the frequency domain; inverse transforming the downsampled video image; and performing motion compensation for the downsampled image in the spatial domain.
2. The method of claim 1, wherein the compressed video image in the frequency domain comprises a discrete cosine transform (DCT) image.
3. The method of claim 2, wherein the DCT image is stored as a DCT image that complies with an MPEG specification.
4. The method of claim 3, wherein the DCT image is stored as a frame type image.
5. The method of claim 4, wherein the motion compensation data signals are stored as frame prediction type motion compensation.
6. The method of claim 3, wherein the DCT image is stored as a field type image.
7. The method of claim 6, wherein the motion compensation data signals are stored as field prediction type motion compensation.
8. The method of claim 1, and further comprising displaying the downsampled spatial image so that resulting non-uniform vertical spacing of data signal lines in the downsampled spatial image appear substantially uniform on a computer monitor.
9. The method of claim 1, wherein the downsampling is performed using an integer ratio.
10. The method of claim 1, wherein performing motion compensation comprises scaling motion vectors in accordance with the downsampling ratio.
11. The method of claim 10, wherein motion vector scaling comprises implementing an interpolation operation.
12. The method of claim 11, wherein motion vector scaling comprises implementing a bilinear interpolation operation.
13. The method of claim 12, wherein the bilinear interpolation operation is implemented on 3D pipeline hardware.
14. The method of claim 1, wherein downsampling comprises implementing a linear filter as a bilinear interpolation operation.
15. The method of claim 14, wherein the bilinear interpolation operation is implemented on 3D pipeline hardware.
16. A method of performing video image decoding comprising: inverse transforming a compressed video image; downsampling the inverse transformed image in the spatial domain; and performing motion compensation for the downsampled image in the spatial domain.
17. The method of claim 16, wherein the compressed video image comprises a discrete cosine transform (DCT) image.
18. The method of claim 17, wherein the DCT image is stored as a DCT image that complies with an MPEG specification.
19. The method of claim 18, wherein the DCT image comprises macroblocks stored as frame macroblocks and macroblocks stored as field macroblocks.
20. The method of claim 19, and further comprising: converting the frame macroblocks to field macroblocks prior to downsampling in the spatial domain.
21. The method of claim 19, wherein the motion compensation data signals are stored as field prediction type motion compensation.
22. The method of claim 16, wherein performing motion compensation comprises scaling motion vectors in accordance with a downscaling ratio.
23. The method of claim 22, wherein motion vector scaling comprises implementing an interpolation operation.
24. The method of claim 23, wherein motion vector scaling comprises implementing a bilinear interpolation operation.
25. The method of claim 24, wherein the bilinear interpolation operation is implemented on 3D pipeline hardware.
26. The method of claim 16, wherein downsampling comprises implementing a linear filter as a bilinear interpolation operation.
27. The method of claim 26, wherein the bilinear interpolation operation is implemented on 3D pipeline hardware.
28. An article comprising: a storage medium, having stored thereon instructions, that when executed by a platform, result in the following: downsampling a compressed video image in the frequency domain; inverse transforming the downsampled video image; and performing motion compensation for the downsampled image in the spatial domain.
29. The article of claim 28, wherein the instructions, when executed further result in the compressed video image in the frequency domain comprising a discrete cosine transform (DCT) image.
30. The article of claim 29, wherein the instructions, when executed, further result in the DCT image being stored as a DCT image that complies with an MPEG specification.
31. The article of claim 28, wherein the instructions, when executed, further result in: displaying the downsampled spatial image so that resulting non-uniform vertical spacing of data signal lines in the downsampled spatial image appear substantially uniform on a computer monitor.
32. An article comprising: a storage medium, having stored thereon instructions, that when executed by a platform, result in the following: inverse transforming a compressed video image; downsampling the inverse transformed image in the spatial domain; and performing motion compensation for the downsampled image in the spatial domain.
33. The article of claim 32, wherein the instructions, when executed further result in the compressed video image in the frequency domain comprising a discrete cosine transform (DCT) image.
34. The article of claim 33, wherein the instructions, when executed, further result in the DCT image being stored as a DCT image that complies with an MPEG specification.